## Loading required package: stringi
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.1.0 ✓ forcats 0.5.1
## ✓ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: leaflet
##
## Attaching package: 'leafletDK'
## The following objects are masked from 'package:mapDK':
##
## parish, rural, zip
The data was downloaded through the API from “Danmarks Statistik” into OpenRefine and cleaned there. I will do the rest of the modifications here in RStudio.
I load the data for the year 2007 on the number of people aged 18 who moved to Copenhagen, specified by the municipality they came from. Then I check the first 6 rows to confirm that it looks correct and to see the column specifications.
data07 <- read_csv("aar2007.csv",show_col_types = FALSE)
head(data07)
## # A tibble: 6 × 4
## TID FRAKOMMUNE ALDER INDHOLD
## <dbl> <chr> <chr> <dbl>
## 1 2007 Koebenhavn 18 år 0
## 2 2007 Frederiksberg 18 år 63
## 3 2007 Dragoer 18 år 10
## 4 2007 Taernby 18 år 41
## 5 2007 Albertslund 18 år 10
## 6 2007 Ballerup 18 år 21
Here I want to see how large a percentage of each municipality's 18-year-old population the people who move away constitute. I again use the Danmarks Statistik API to find the total number of 18-year-olds in each municipality.
aar1808 <- read_csv("antal18.csv", show_col_types = FALSE)
head(aar1808)
## # A tibble: 6 × 2
## OMRÅDE INDHOLD
## <chr> <dbl>
## 1 Koebenhavn 4042
## 2 Frederiksberg 648
## 3 Dragoer 142
## 4 Taernby 471
## 5 Albertslund 435
## 6 Ballerup 541
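I combine the two datasets with the mutate function. This relies on the rows of both files being in the same municipality order, as they are in the first rows shown above. As we can see below, the data07 set now has 5 variables.
data07 <- data07 %>%
  mutate(Total18 = aar1808$INDHOLD)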
head(data07)
## # A tibble: 6 × 5
## TID FRAKOMMUNE ALDER INDHOLD Total18
## <dbl> <chr> <chr> <dbl> <dbl>
## 1 2007 Koebenhavn 18 år 0 4042
## 2 2007 Frederiksberg 18 år 63 648
## 3 2007 Dragoer 18 år 10 142
## 4 2007 Taernby 18 år 41 471
## 5 2007 Albertslund 18 år 10 435
## 6 2007 Ballerup 18 år 21 541
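Adding the totals by position only works while both tables list the municipalities in the same order. A minimal sketch of matching on the municipality name instead, using base R's `match()`. The two small data frames are toy stand-ins built from the rows shown above, and the column name `OMRAADE` is an ASCII stand-in for `OMRÅDE`; both are assumptions for illustration.

```r
# Toy stand-ins for the two tables; values are taken from the rows printed above.
moves <- data.frame(
  FRAKOMMUNE = c("Koebenhavn", "Frederiksberg", "Dragoer"),
  INDHOLD = c(0, 63, 10)
)

# Deliberately shuffled to show that row order no longer matters.
# OMRAADE is a hypothetical ASCII stand-in for the OMRÅDE column.
population <- data.frame(
  OMRAADE = c("Dragoer", "Koebenhavn", "Frederiksberg"),
  INDHOLD = c(142, 4042, 648)
)

# For each municipality in `moves`, find its row in `population`.
idx <- match(moves$FRAKOMMUNE, population$OMRAADE)

# Stop early if a municipality has no partner in the population table.
stopifnot(!anyNA(idx))

moves$Total18 <- population$INDHOLD[idx]
moves
```

With the tidyverse already loaded, a join on the name column would be an alternative to `match()`.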
data07 <- data07 %>%
  mutate(procent = (INDHOLD / Total18) * 100)
head(data07)
## # A tibble: 6 × 6
## TID FRAKOMMUNE ALDER INDHOLD Total18 procent
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2007 Koebenhavn 18 år 0 4042 0
## 2 2007 Frederiksberg 18 år 63 648 9.72
## 3 2007 Dragoer 18 år 10 142 7.04
## 4 2007 Taernby 18 år 41 471 8.70
## 5 2007 Albertslund 18 år 10 435 2.30
## 6 2007 Ballerup 18 år 21 541 3.88
# I add the new column to the existing dataset since I want to keep working with it.
I use the “mutate” function because I want to use data that is already included in my sheet. I use mutate to make a new column that shows the percentage of the population that moves away.
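The percentage is simply the number who moved divided by the total number of 18-year-olds, times 100. A small sketch that recomputes three of the values from the table above by hand; the vector names `moved` and `total18` are chosen only for this illustration.

```r
# Numbers copied from the INDHOLD and Total18 columns shown above.
moved   <- c(frederiksberg = 63, dragoer = 10, taernby = 41)
total18 <- c(frederiksberg = 648, dragoer = 142, taernby = 471)

# Same formula as in the mutate() call: share of 18-year-olds who moved, in percent.
procent <- moved / total18 * 100

# Rounded to two decimals for comparison with the printed tibble.
round(procent, 2)
```

For Frederiksberg this is 63 / 648 * 100, which rounds to the 9.72 shown in the output.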
Since I use custom and very specific tools, municipalityDK (from the leafletDK package) and mapDK, they cannot read my data unless the municipality names match 100%. I already cleaned up the special characters in OpenRefine, but now I also need to rename the column and convert the names to lowercase.
# for 2007
data07 <- data07 %>%
  mutate(FRAKOMMUNE = tolower(FRAKOMMUNE)) %>%
  rename(kommune = FRAKOMMUNE)
head(data07)
## # A tibble: 6 × 6
## TID kommune ALDER INDHOLD Total18 procent
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2007 koebenhavn 18 år 0 4042 0
## 2 2007 frederiksberg 18 år 63 648 9.72
## 3 2007 dragoer 18 år 10 142 7.04
## 4 2007 taernby 18 år 41 471 8.70
## 5 2007 albertslund 18 år 10 435 2.30
## 6 2007 ballerup 18 år 21 541 3.88
Here I use mapDK to create a map that shows the percentage who moved. It is visualised with dark blue for low values and lighter blue for higher values.
kommunekort1 <- mapDK(values = 'procent', id = 'kommune', data = data07)
## Warning in mapDK(values = "procent", id = "kommune", data = data07): Some id not
## recognized: taernby
## Warning in mapDK(values = "procent", id = "kommune", data = data07): You
## provided no data for the following ids: taarnby
kommunekort1
The warnings appear because my data spells the municipality “taernby” while mapDK expects “taarnby”, so Tårnby cannot be matched and is left without a value. I also have no data for Christiansø. The value for Copenhagen is 0, since you can't move from and to the same municipality.
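The mapDK warnings above name `taernby` as unrecognised and `taarnby` as lacking data, which points to a spelling mismatch. A minimal sketch of how such mismatches could be found with `setdiff()` and fixed by recoding. The vector `map_ids` is a hypothetical excerpt of the ids the map expects, not something read from the package.

```r
# Ids as they appear in my data after tolower().
data_ids <- c("frederiksberg", "dragoer", "taernby")

# Hypothetical excerpt of the ids the map expects (assumption for illustration).
map_ids <- c("frederiksberg", "dragoer", "taarnby")

# Ids in my data that the map does not know.
unmatched <- setdiff(data_ids, map_ids)

# Ids the map expects but my data does not provide.
missing_in_data <- setdiff(map_ids, data_ids)

# Recode the mismatched spelling so the two sides agree.
data_ids[data_ids == "taernby"] <- "taarnby"
```

On the real data the same recoding would be applied to the `kommune` column before drawing the map.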
I find that this isn't very easy to read, so I will now try municipalityDK.
This way it becomes a little easier to read, and the map is now also interactive: you can click on the municipalities and see the given value.
kommunekort2 <- municipalityDK("procent", "kommune", data = data07, legend = TRUE, pal = "GnBu") %>%
  setMapWidgetStyle(list(background = "white"))
## Loading required package: sp
## Missing values for Christiansø
## Missing values for Tårnby
kommunekort2
I want to make the colour contrast even clearer and the map easier to use and understand.
kommunekort3 <- municipalityDK("procent", "kommune", data = data07, legend = TRUE, pal = colfunc(10)) %>%
  setMapWidgetStyle(list(background = "white"))
## Missing values for Christiansø
## Missing values for Tårnby
kommunekort3
The contrast now runs from blue to red and is therefore much easier to read and understand.
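The definition of `colfunc` is not shown in the code above. A minimal sketch of one way it could have been defined, assuming it is a blue-to-red ramp built with `colorRampPalette()` from grDevices; the exact end colours are an assumption based on the description of the map.

```r
# Assumed definition: a function that interpolates colours from blue to red.
colfunc <- grDevices::colorRampPalette(c("blue", "red"))

# Ten evenly spaced colours, as passed to the `pal` argument above.
palette10 <- colfunc(10)
palette10
```

The first colour is pure blue and the last is pure red, with the remaining eight steps interpolated between them.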
I see that Frederiksberg is the municipality with the highest values. Now I want to examine whether that has changed over time.
I use the Danmarks Statistik API again and load the data:
fre18 <- read_delim("https://api.statbank.dk/v1/data/FLY66/CSV?delimiter=Semicolon&TILKOMMUNE=101&FRAKOMMUNE=147&ALDER=18&Tid=*",show_col_types = FALSE)
I then want to make a simple plot with ggplot to see whether these are normal values or an outlier.
ggplot(fre18, aes(x = TID, y = INDHOLD)) + geom_line(colour = "red")
Here I can see that the year 2008 isn't an outlier, and that Copenhagen being the most popular destination for people moving from Frederiksberg seems very likely.
Then, to test the relation between Copenhagen and the southern part of Sjælland:
lol18 <- read_delim("https://api.statbank.dk/v1/data/FLY66/CSV?delimiter=Semicolon&TILKOMMUNE=101&FRAKOMMUNE=360&ALDER=18&Tid=*",show_col_types = FALSE)
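The two request URLs differ only in the `FRAKOMMUNE` code. A small sketch of a helper that builds the URL from the municipality codes, so further municipalities can be requested without copying the whole string. The function name `flyt_url` and its argument names are hypothetical; the URL pattern itself is copied from the calls above.

```r
# Hypothetical helper: builds the FLY66 request URL used above.
# to_kommune = 101 and age = 18 mirror the values in the original URLs.
flyt_url <- function(from_kommune, to_kommune = 101, age = 18) {
  sprintf(
    "https://api.statbank.dk/v1/data/FLY66/CSV?delimiter=Semicolon&TILKOMMUNE=%s&FRAKOMMUNE=%s&ALDER=%s&Tid=*",
    to_kommune, from_kommune, age
  )
}

# Rebuilds the two URLs used in this document.
url_fre <- flyt_url(147)
url_lol <- flyt_url(360)
```

The result can be passed directly to `read_delim()` in the same way as the hard-coded strings.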
I then want to make a simple plot with ggplot to see whether these are normal values or an outlier.
ggplot(lol18, aes(x = TID, y = INDHOLD)) + geom_line(colour = "red")
The graph shows a very similar development to the Frederiksberg graph and indicates a decline towards 2020.
Finally, I want to test that my project is reproducible, and I also want to make my kommunekort3 comparable with another visualization. Therefore I want to make the same map, but with data for 2017, 10 years later, to see if there are changes.
aar17 <- read_csv("aar1817ny.csv",show_col_types = FALSE)
data17 <- read_csv("Data17.csv",show_col_types = FALSE)
data17 <- data17 %>%
  mutate(Total18 = aar17$INDHOLD)
# for 2017
data17 <- data17 %>%
  mutate(FRAKOMMUNE = tolower(FRAKOMMUNE)) %>%
  rename(kommuner = FRAKOMMUNE)
data17 <- data17 %>%
  mutate(procent = (INDHOLD / Total18) * 100)
# I add the new column to the existing dataset since I want to keep working with it.
kommunekort4 <- municipalityDK("procent", "kommuner", data = data17, legend = TRUE, pal = colfunc(10)) %>%
  setMapWidgetStyle(list(background = "white"))
## Missing values for Tårnby
I can easily reproduce my project with new data.
kommunekort4
And for comparison, my original map for 2007:
kommunekort3
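The preparation steps for 2007 and 2017 are the same, so they could be collected in one function and reused for any year. A minimal sketch in base R, assuming the input has the columns `FRAKOMMUNE` and `INDHOLD` and that the totals are in the same row order. The function name `prepare_year` is hypothetical, and the small data frame only reuses two rows from the 2007 table.

```r
# Hypothetical helper that repeats the steps used for 2007 and 2017:
# add the totals, lowercase and rename the municipality column, compute the percentage.
prepare_year <- function(moves, total18) {
  # The totals are attached by position, so the lengths must agree.
  stopifnot(nrow(moves) == length(total18))
  moves$Total18 <- total18
  moves$kommune <- tolower(moves$FRAKOMMUNE)
  moves$FRAKOMMUNE <- NULL
  moves$procent <- moves$INDHOLD / moves$Total18 * 100
  moves
}

# Toy input built from two rows of the 2007 table shown earlier.
demo <- prepare_year(
  data.frame(FRAKOMMUNE = c("Frederiksberg", "Ballerup"), INDHOLD = c(63, 21)),
  total18 = c(648, 541)
)
demo
```

The returned data frame has the `kommune` and `procent` columns that the map functions above expect.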